Automatic annotation of multilingual text collections with a conceptual thesaurus

Authors

  • Bruno Pouliquen
  • Ralf Steinberger
  • Camelia Ignat

Abstract

Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is useful not only for document indexing and retrieval: mapping texts onto the same thesaurus also makes it possible to establish links between similar documents, which is a significant requirement of the Semantic Web. This paper presents an almost language-independent system that maps documents written in different languages onto the same multilingual conceptual thesaurus, EUROVOC. Conceptual thesauri differ from natural language thesauri in that they consist of relatively small controlled lists of words or phrases with a rather abstract meaning. To automatically identify the thesaurus descriptors that best describe the contents of a document, we developed a statistical, associative system that is trained on texts that have previously been indexed manually. In addition to describing the large number of empirically optimised parameters of the fully functional application, we present the performance of the software according to a human evaluation by professional indexers.
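To make the associative idea concrete, the following is a minimal sketch under explicit assumptions. It is not the authors' implementation.

The sketch works as follows:

* For each descriptor, it learns weighted "associate" words from manually indexed training texts.
* It ranks descriptors for a new text by cosine similarity against those associate lists.
* It compares two texts through their descriptor vectors.

The following pieces are hypothetical:

* the descriptor labels (`FISHERIES`, `TAXATION`);
* the toy training data;
* the function names;
* the log-ratio association score, which is a simple stand-in.

The sketch also omits the many empirically optimised parameters the abstract mentions.

```python
import math
from collections import Counter, defaultdict


def train_associates(training_docs, top_k=20):
    """Learn weighted associate words per descriptor from manually indexed docs.

    training_docs: list of (tokens, descriptors) pairs, all in ONE language.
    Score (a hypothetical stand-in, not the paper's statistic):
    log(relative freq. in the descriptor's docs / relative freq. in the corpus),
    multiplied by the raw count.
    """
    corpus = Counter()
    per_desc = defaultdict(Counter)
    for tokens, descriptors in training_docs:
        counts = Counter(tokens)
        corpus.update(counts)
        for d in descriptors:
            per_desc[d].update(counts)
    corpus_total = sum(corpus.values())
    associates = {}
    for d, counts in per_desc.items():
        d_total = sum(counts.values())
        weights = {}
        for w, c in counts.items():
            ratio = (c / d_total) / (corpus[w] / corpus_total)
            if ratio > 1.0:  # keep only words over-represented for this descriptor
                weights[w] = math.log(ratio) * c
        best = sorted(weights.items(), key=lambda kv: -kv[1])[:top_k]
        associates[d] = dict(best)
    return associates


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_descriptors(tokens, associates, n=3):
    """Rank descriptors by cosine between the text and each associate list."""
    doc = Counter(tokens)
    scored = [(d, cosine(doc, assoc)) for d, assoc in associates.items()]
    scored = [(d, s) for d, s in scored if s > 0.0]
    scored.sort(key=lambda ds: (-ds[1], ds[0]))
    return scored[:n]


def descriptor_similarity(ranked_a, ranked_b):
    """Compare two texts via their (language-independent) descriptor vectors."""
    return cosine(dict(ranked_a), dict(ranked_b))


# Toy, hypothetical training data: one model per language, same descriptor set.
en_train = [
    ("fishing quota vessel sea catch".split(), ["FISHERIES"]),
    ("fish stock quota fleet".split(), ["FISHERIES"]),
    ("tax vat rate budget".split(), ["TAXATION"]),
    ("income tax reform budget".split(), ["TAXATION"]),
]
fr_train = [
    ("pêche quota navire mer".split(), ["FISHERIES"]),
    ("stock poisson quota flotte".split(), ["FISHERIES"]),
    ("impôt tva taux budget".split(), ["TAXATION"]),
    ("impôt revenu réforme budget".split(), ["TAXATION"]),
]
en_model = train_associates(en_train)
fr_model = train_associates(fr_train)

en_doc = assign_descriptors("new quota for the fishing fleet".split(), en_model)
fr_doc = assign_descriptors("nouveau quota pour la flotte de pêche".split(), fr_model)
en_tax = assign_descriptors("vat rate and income tax".split(), en_model)

print(en_doc, fr_doc, en_tax)
print(descriptor_similarity(en_doc, fr_doc))  # EN and FR fishery texts share a descriptor
print(descriptor_similarity(en_doc, en_tax))  # no shared descriptor
```

Each language gets its own model, but every model is trained against the same descriptor set. As a result, an English text and a French text end up as vectors over the same descriptors, and they can be linked even though they share no words. This mirrors, in a simplified way, the abstract's point that mapping texts onto one multilingual thesaurus allows links between similar documents.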


Similar resources

Concept Propagation Based on Visual Similarity Application to Medical Image Annotation

This paper presents an approach for image annotation propagation to images which have no annotations. In some specific domains, the assumption that visual similarity implies (partial) semantic similarity can be made. For instance, in medical imaging, two images of the same anatomic part in a given modality have a very similar appearance. In the proposed approach, a conceptual indexing phase ext...

Multilingual Ontology Enrichment for Semantic Annotation and Retrieval of Medical Information

Background: Knowledge management in the European project Noesis addresses concept-based annotation and multilingual Information Retrieval of documents. Objective: Multilingual enrichment of a concept-based terminology in the medical field. Experience and evaluation in the domain of cardiovascular diseases by enriching a subset of the MeSH thesaurus in six European languages. This terminology, r...

Conceptual Database Retrieval through Multilingual Thesauri

In traditional database management systems, information retrieval is often carried out using keywords contained within fields of each record. Because a term (concept) can be expressed in several ways, a significant number of records are ignored by the free text techniques which use only a posteriori relations between terms. This paper proposes the utilisation of a priori conceptual relations be...

Using Thesauri for Automatic Indexing and for the Visualisation of Multilingual Document Collections

This article presents an approach for cross-language document comparison and for the visualisation of multilingual document collections. Document comparison usually relies on the calculation of the degree of lexical overlap between documents. As this is not possible for documents written in different languages, the contents of these documents first have to be mapped onto a language-independent ...

Supporting Semantic Image Annotation and Search

In this article we discuss an application scenario for semantic annotation and search in a collection of art images. This application shows that background knowledge in the form of ontologies can be used to support indexing and search in image collections. The underlying ontologies are represented in RDF Schema and are based on existing data standards and knowledge corpora, such as the VRA Core...

Journal title:
  • CoRR

Volume: abs/cs/0609059

Publication date: 2003